makeHopeLive

Introduction

In this analysis, we use the Prosper loan data to find out which factors affect the APR (Annual Percentage Rate) and to build a prediction model for it. We also want to show borrowers how they can reduce their BorrowerAPR.
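One simple way to start looking for factors that move with BorrowerAPR is to rank numeric columns by the strength of their Pearson correlation with it. The sketch below is only an illustration of that idea, not the method used in this report: the helper names `pearson` and `rank_by_correlation` are hypothetical, and the five rows are made-up values chosen for the example, not records from the Prosper data.

```python
from math import sqrt

# Made-up illustrative rows, NOT taken from the Prosper dataset.
# Column names follow the variable list in this report.
loans = [
    {"BorrowerAPR": 0.10, "CreditScoreRangeLower": 780, "DebtToIncomeRatio": 0.20},
    {"BorrowerAPR": 0.15, "CreditScoreRangeLower": 740, "DebtToIncomeRatio": 0.10},
    {"BorrowerAPR": 0.20, "CreditScoreRangeLower": 700, "DebtToIncomeRatio": 0.30},
    {"BorrowerAPR": 0.25, "CreditScoreRangeLower": 660, "DebtToIncomeRatio": 0.15},
    {"BorrowerAPR": 0.30, "CreditScoreRangeLower": 620, "DebtToIncomeRatio": 0.35},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_by_correlation(rows, target="BorrowerAPR"):
    """Return (column, r) pairs sorted by |r|, strongest relationship first."""
    target_values = [row[target] for row in rows]
    columns = [name for name in rows[0] if name != target]
    scores = [
        (name, pearson([row[name] for row in rows], target_values))
        for name in columns
    ]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


ranked = rank_by_correlation(loans)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

A correlation ranking like this only screens for linear, one-variable-at-a-time relationships, so it is a starting point for choosing candidate predictors rather than a substitute for the prediction model itself.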

Prepare Data

Show the basic meaning of each variable
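A variable dictionary like the one printed in this section is easier to consult as a name-to-description lookup than as a wide table. The following is a minimal sketch under stated assumptions: the dictionary is taken to be a CSV with the two columns `Variable` and `Description` (matching the printed headers), `load_definitions` is a hypothetical helper, and the two sample rows are copied from the output shown here rather than read from a real file.

```python
import csv
import io

# Two rows copied from the dictionary printed in this section. The CSV layout
# (a "Variable" column and a "Description" column) is an assumption based on
# the printed column headers.
DICTIONARY_CSV = """Variable,Description
BorrowerAPR,The Borrower's Annual Percentage Rate (APR) for the loan.
Term,The length of the loan expressed in months.
"""


def load_definitions(text):
    """Map each variable name to its description."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Variable"]: row["Description"] for row in reader}


definitions = load_definitions(DICTIONARY_CSV)

# Print one "name: meaning" pair per line instead of a wide two-column table.
for name, meaning in definitions.items():
    print(f"{name}: {meaning}")
```

With the full dictionary loaded the same way, `definitions["BorrowerAPR"]` gives the meaning of a single variable without scanning the whole listing.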

##                               Variable
## 1                           ListingKey
## 2                        ListingNumber
## 3                  ListingCreationDate
## 4                          CreditGrade
## 5                                 Term
## 6                           LoanStatus
## 7                           ClosedDate
## 8                          BorrowerAPR
## 9                         BorrowerRate
## 10                         LenderYield
## 11             EstimatedEffectiveYield
## 12                       EstimatedLoss
## 13                     EstimatedReturn
## 14             ProsperRating (numeric)
## 15               ProsperRating (Alpha)
## 16                        ProsperScore
## 17                     ListingCategory
## 18                       BorrowerState
## 19                          Occupation
## 20                    EmploymentStatus
## 21            EmploymentStatusDuration
## 22                 IsBorrowerHomeowner
## 23                    CurrentlyInGroup
## 24                            GroupKey
## 25                    DateCreditPulled
## 26               CreditScoreRangeLower
## 27               CreditScoreRangeUpper
## 28             FirstRecordedCreditLine
## 29                  CurrentCreditLines
## 30                     OpenCreditLines
## 31          TotalCreditLinespast7years
## 32               OpenRevolvingAccounts
## 33         OpenRevolvingMonthlyPayment
## 34                InquiriesLast6Months
## 35                      TotalInquiries
## 36                CurrentDelinquencies
## 37                    AmountDelinquent
## 38             DelinquenciesLast7Years
## 39            PublicRecordsLast10Years
## 40           PublicRecordsLast12Months
## 41              RevolvingCreditBalance
## 42                 BankcardUtilization
## 43             AvailableBankcardCredit
## 44                         TotalTrades
## 45               TradesNeverDelinquent
## 46             TradesOpenedLast6Months
## 47                   DebtToIncomeRatio
## 48                         IncomeRange
## 49                    IncomeVerifiable
## 50                 StatedMonthlyIncome
## 51                             LoanKey
## 52                   TotalProsperLoans
## 53          TotalProsperPaymentsBilled
## 54               OnTimeProsperPayments
## 55 ProsperPaymentsLessThanOneMonthLate
## 56     ProsperPaymentsOneMonthPlusLate
## 57            ProsperPrincipalBorrowed
## 58         ProsperPrincipalOutstanding
## 59         ScorexChangeAtTimeOfListing
## 60           LoanCurrentDaysDelinquent
## 61       LoanFirstDefaultedCycleNumber
## 62          LoanMonthsSinceOrigination
## 63                          LoanNumber
## 64                  LoanOriginalAmount
## 65                 LoanOriginationDate
## 66              LoanOriginationQuarter
## 67                           MemberKey
## 68                  MonthlyLoanPayment
## 69                 LP_CustomerPayments
## 70        LP_CustomerPrincipalPayments
## 71                  LP_InterestandFees
## 72                      LP_ServiceFees
## 73                   LP_CollectionFees
## 74               LP_GrossPrincipalLoss
## 75                 LP_NetPrincipalLoss
## 76     LP_NonPrincipalRecoverypayments
## 77                       PercentFunded
## 78                     Recommendations
## 79          InvestmentFromFriendsCount
## 80         InvestmentFromFriendsAmount
## 81                           Investors
##    Description
## 1  Unique key for each listing, same value as the 'key' used in the listing object in the API.
## 2  The number that uniquely identifies the listing to the public as displayed on the website.
## 3  The date the listing was created.
## 4  The Credit rating that was assigned at the time the listing went live. Applicable for listings pre-2009 period and will only be populated for those listings.
## 5  The length of the loan expressed in months.
## 6  The current status of the loan: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, PastDue. The PastDue status will be accompanied by a delinquency bucket.
## 7  Closed date is applicable for Cancelled, Completed, Chargedoff and Defaulted loan statuses.
## 8  The Borrower's Annual Percentage Rate (APR) for the loan.
## 9  The Borrower's interest rate for this loan.
## 10 The Lender yield on the loan. Lender yield is equal to the interest rate on the loan less the servicing fee.
## 11 Effective yield is equal to the borrower interest rate (i) minus the servicing fee rate, (ii) minus estimated uncollected interest on charge-offs, (iii) plus estimated collected late fees. Applicable for loans originated after July 2009.
## 12 Estimated loss is the estimated principal loss on charge-offs. Applicable for loans originated after July 2009.
## 13 The estimated return assigned to the listing at the time it was created. Estimated return is the difference between the Estimated Effective Yield and the Estimated Loss Rate. Applicable for loans originated after July 2009.
## 14 The Prosper Rating assigned at the time the listing was created: 0 - N/A, 1 - HR, 2 - E, 3 - D, 4 - C, 5 - B, 6 - A, 7 - AA. Applicable for loans originated after July 2009.
## 15 The Prosper Rating assigned at the time the listing was created between AA - HR. Applicable for loans originated after July 2009.
## 16 A custom risk score built using historical Prosper data. The score ranges from 1-10, with 10 being the best, or lowest risk score. Applicable for loans originated after July 2009.
## 17 The category of the listing that the borrower selected when posting their listing: 0 - Not Available, 1 - Debt Consolidation, 2 - Home Improvement, 3 - Business, 4 - Personal Loan, 5 - Student Use, 6 - Auto, 7 - Other, 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans, 13 - Household Expenses, 14 - Large Purchases, 15 - Medical/Dental, 16 - Motorcycle, 17 - RV, 18 - Taxes, 19 - Vacation, 20 - Wedding Loans
## 18 The two letter abbreviation of the state of the address of the borrower at the time the Listing was created.
## 19 The Occupation selected by the Borrower at the time they created the listing.
## 20 The employment status of the borrower at the time they posted the listing.
## 21 The length in months of the employment status at the time the listing was created.
## 22 A Borrower will be classified as a homowner if they have a mortgage on their credit profile or provide documentation confirming they are a homeowner.
## 23 Specifies whether or not the Borrower was in a group at the time the listing was created.
## 24 The Key of the group in which the Borrower is a member of. Value will be null if the borrower does not have a group affiliation.
## 25 The date the credit profile was pulled.
## 26 The lower value representing the range of the borrower's credit score as provided by a consumer credit rating agency.
## 27 The upper value representing the range of the borrower's credit score as provided by a consumer credit rating agency.
## 28 The date the first credit line was opened.
## 29 Number of current credit lines at the time the credit profile was pulled.
## 30 Number of open credit lines at the time the credit profile was pulled.
## 31 Number of credit lines in the past seven years at the time the credit profile was pulled.
## 32 Number of open revolving accounts at the time the credit profile was pulled.
## 33 Monthly payment on revolving accounts at the time the credit profile was pulled.
## 34 Number of inquiries in the past six months at the time the credit profile was pulled.
## 35 Total number of inquiries at the time the credit profile was pulled.
## 36 Number of accounts delinquent at the time the credit profile was pulled.
## 37 Dollars delinquent at the time the credit profile was pulled.
## 38 Number of delinquencies in the past 7 years at the time the credit profile was pulled.
## 39 Number of public records in the past 10 years at the time the credit profile was pulled.
## 40 Number of public records in the past 12 months at the time the credit profile was pulled.
## 41 Dollars of revolving credit at the time the credit profile was pulled.
## 42 The percentage of available revolving credit that is utilized at the time the credit profile was pulled.
## 43 The total available credit via bank card at the time the credit profile was pulled.
## 44 Number of trade lines ever opened at the time the credit profile was pulled.
## 45 Number of trades that have never been delinquent at the time the credit profile was pulled.
## 46 Number of trades opened in the last 6 months at the time the credit profile was pulled.
## 47 The debt to income ratio of the borrower at the time the credit profile was pulled. This value is Null if the debt to income ratio is not available. This value is capped at 10.01 (any debt to income ratio larger than 1000% will be returned as 1001%).
## 48 The income range of the borrower at the time the listing was created.
## 49 The borrower indicated they have the required documentation to support their income.
## 50 The monthly income the borrower stated at the time the listing was created.
## 51 Unique key for each loan. This is the same key that is used in the API.
## 52 Number of Prosper loans the borrower at the time they created this listing. This value will be null if the borrower had no prior loans.
## 53 Number of on time payments the borrower made on Prosper loans at the time they created this listing. This value will be null if the borrower had no prior loans.
## 54 Number of on time payments the borrower had made on Prosper loans at the time they created this listing. This value will be null if the borrower has no prior loans.
## 55 Number of payments the borrower made on Prosper loans that were less than one month late at the time they created this listing. This value will be null if the borrower had no prior loans.
## 56 Number of payments the borrower made on Prosper loans that were greater than one month late at the time they created this listing. This value will be null if the borrower had no prior loans.
## 57 Total principal borrowed on Prosper loans at the time the listing was created. This value will be null if the borrower had no prior loans.
## 58 Principal outstanding on Prosper loans at the time the listing was created. This value will be null if the borrower had no prior loans.
## 59 Borrower's credit score change at the time the credit profile was pulled. This will be the change relative to the borrower's last Prosper loan. This value will be null if the borrower had no prior loans.
## 60 The number of days delinquent.
## 61 The cycle the loan was charged off. If the loan has not charged off the value will be null.
## 62 Number of months since the loan originated.
## 63 Unique numeric value associated with the loan.
## 64 The origination amount of the loan.
## 65 The date the loan was originated.
## 66 The quarter in which the loan was originated.
## 67                                                                                                                                                                                                                                                                                                                                             The unique key that is associated with the borrower. This is the same identifier that is used in the API member object.
## 68                                                                                                                                                                                                                                                                                                                                                                                                                                 The scheduled monthly loan payment.
## 69                                                                                                                                                                                                                                                                                                                     Pre charge-off cumulative gross payments made by the borrower on the loan. If the loan has charged off, this value will exclude any recoveries.
## 70                                                                                                                                                                                                                                                                                                                 Pre charge-off cumulative principal payments made by the borrower on the loan. If the loan has charged off, this value will exclude any recoveries.
## 71                                                                                                                                                                                                                                                                                                                              Pre charge-off cumulative interest and fees paid by the borrower. If the loan has charged off, this value will exclude any recoveries.
## 72                                                                                                                                                                                                                                                                                                                                                                                        Cumulative service fees paid by the investors who have invested in the loan.
## 73                                                                                                                                                                                                                                                                                                                                                                                     Cumulative collection fees paid by the investors who have invested in the loan.
## 74                                                                                                                                                                                                                                                                                                                                                                                                                           The gross charged off amount of the loan.
## 75                                                                                                                                                                                                                                                                                                                                                                                                        The principal that remains uncollected after any recoveries.
## 76                                                                                                                                                                                                                                                                                                             The interest and fee component of any recovery payments. The current payment policy applies payments in the following order: Fees, interest, principal.
## 77                                                                                                                                                                                                                                                                                                                                                                                                                                     Percent the listing was funded.
## 78                                                                                                                                                                                                                                                                                                                                                                                     Number of recommendations the borrower had at the time the listing was created.
## 79                                                                                                                                                                                                                                                                                                                                                                                                              Number of friends that made an investment in the loan.
## 80                                                                                                                                                                                                                                                                                                                                                                                                             Dollar amount of investments that were made by friends.
## 81                                                                                                                                                                                                                                                                                                                                                                                                                       The number of investors that funded the loan.

Show the structure of the data set.

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...

Wow, 81 variables. Since I am not familiar with loan data features, I need to understand them through the exploration below and some web searching. With 113937 observations, the data set is not small.
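As a rough sketch of how such a structure overview can be produced and condensed, the snippet below uses a tiny stand-in data frame. The name `loans` and its three rows are assumptions for illustration only, not the real data. With 81 columns, a count of columns per type is often easier to read than the full `str()` listing.

```r
# Toy stand-in for the loan data (hypothetical values; the real set has
# 113937 rows and 81 columns).
loans <- data.frame(
  ListingKey  = c("A1", "B2", "C3"),
  Term        = c(36L, 36L, 60L),
  BorrowerAPR = c(0.165, 0.120, 0.283),
  stringsAsFactors = TRUE
)

dim(loans)   # number of rows and columns
str(loans)   # type and first values of every column

# Condense the overview: how many columns are there of each type?
col_types <- sapply(loans, function(x) class(x)[1])
table(col_types)
```

On the real data the same calls would simply be applied to the full data frame instead of the toy one.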

Show the summary of the data.

##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##                     ListingCreationDate  CreditGrade         Term      
##  2013-10-02 17:20:16.550000000:     6          :84984   Min.   :12.00  
##  2013-08-28 20:31:41.107000000:     4   C      : 5649   1st Qu.:36.00  
##  2013-09-08 09:27:44.853000000:     4   D      : 5153   Median :36.00  
##  2013-12-06 05:43:13.830000000:     4   B      : 4389   Mean   :40.83  
##  2013-12-06 11:44:58.283000000:     4   AA     : 3509   3rd Qu.:36.00  
##  2013-08-21 07:25:22.360000000:     3   HR     : 3508   Max.   :60.00  
##  (Other)                      :113912   (Other): 6745                  
##                  LoanStatus                  ClosedDate   
##  Current              :56576                      :58848  
##  Completed            :38074   2014-03-04 00:00:00:  105  
##  Chargedoff           :11992   2014-02-19 00:00:00:  100  
##  Defaulted            : 5018   2014-02-11 00:00:00:   92  
##  Past Due (1-15 days) :  806   2012-10-30 00:00:00:   81  
##  Past Due (31-60 days):  363   2013-02-26 00:00:00:   78  
##  (Other)              : 1108   (Other)            :54633  
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000                  :29084         Min.   : 1.00  
##  1st Qu.:3.000           C      :18345         1st Qu.: 4.00  
##  Median :4.000           B      :15581         Median : 6.00  
##  Mean   :4.072           A      :14551         Mean   : 5.95  
##  3rd Qu.:5.000           D      :14274         3rd Qu.: 8.00  
##  Max.   :7.000           E      : 9795         Max.   :11.00  
##  NA's   :29084           (Other):12307         NA's   :29084  
##  ListingCategory..numeric. BorrowerState  
##  Min.   : 0.000            CA     :14717  
##  1st Qu.: 1.000            TX     : 6842  
##  Median : 1.000            NY     : 6729  
##  Mean   : 2.774            FL     : 6720  
##  3rd Qu.: 3.000            IL     : 5921  
##  Max.   :20.000                   : 5515  
##                            (Other):67493  
##                     Occupation         EmploymentStatus
##  Other                   :28617   Employed     :67322  
##  Professional            :13628   Full-time    :26355  
##  Computer Programmer     : 4478   Self-employed: 6134  
##  Executive               : 4311   Not available: 5347  
##  Teacher                 : 3759   Other        : 3806  
##  Administrative Assistant: 3688                : 2255  
##  (Other)                 :55456   (Other)      : 2718  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           False:56459         False:101218    
##  1st Qu.: 26.00           True :57478         True : 12719    
##  Median : 67.00                                               
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey                 DateCreditPulled 
##                         :100596   2013-12-23 09:38:12:     6  
##  783C3371218786870A73D20:  1140   2013-11-21 09:09:41:     4  
##  3D4D3366260257624AB272D:   916   2013-12-06 05:43:16:     4  
##  6A3B336601725506917317E:   698   2014-01-14 20:17:49:     4  
##  FEF83377364176536637E50:   611   2014-02-09 12:14:41:     4  
##  C9643379247860156A00EC0:   342   2013-09-27 22:04:54:     3  
##  (Other)                :  9634   (Other)            :113912  
##  CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:660.0         1st Qu.:679.0        
##  Median :680.0         Median :699.0        
##  Mean   :685.6         Mean   :704.6        
##  3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :880.0         Max.   :899.0        
##  NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##                     :   697     Min.   : 0.00      Min.   : 0.00  
##  1993-12-01 00:00:00:   185     1st Qu.: 7.00      1st Qu.: 6.00  
##  1994-11-01 00:00:00:   178     Median :10.00      Median : 9.00  
##  1995-11-01 00:00:00:   168     Mean   :10.32      Mean   : 9.26  
##  1990-04-01 00:00:00:   161     3rd Qu.:13.00      3rd Qu.:12.00  
##  1995-03-01 00:00:00:   159     Max.   :59.00      Max.   :54.00  
##  (Other)            :112389     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2014-01-22 00:00:00:   491   Q4 2013:14450         
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   Q1 2014:12172         
##  Median : 6500      2014-02-19 00:00:00:   439   Q3 2013: 9180         
##  Mean   : 8337      2013-10-16 00:00:00:   434   Q2 2013: 7099         
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   Q3 2012: 5632         
##  Max.   :35000      2013-09-24 00:00:00:   316   Q2 2012: 5061         
##                     (Other)            :111428   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 

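Why are there so many NA values? Several variables share exactly the same NA count, which suggests they are missing together for the same loans, but the cause is not yet clear. Some of these fields, such as EstimatedEffectiveYield, look like they should be calculated automatically.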
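One way to look into the shared NA counts is to count the missing values per column and then group the columns by that count. The sketch below uses a tiny made-up data frame; the name `loans` and the values in it are assumptions for illustration, not the real Prosper data.

```r
# Toy stand-in for the loan data (hypothetical values).
loans <- data.frame(
  BorrowerAPR             = c(0.17, NA, 0.21, 0.30),
  EstimatedEffectiveYield = c(NA, NA, 0.13, 0.25),
  EstimatedLoss           = c(NA, NA, 0.05, 0.11),
  Term                    = c(36, 36, 60, 12)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(loans))

# Group column names by their NA count, to spot columns that are missing together.
na_groups <- split(names(na_counts), na_counts)

print(na_counts)
print(na_groups)
```

Columns that land in the same group are candidates for "missing for the same reason", for example fields that only exist for loans created after a certain date.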

Why are there duplicate ListingKey values? The rows for one duplicated ListingKey are subset below as an example.

##                    ListingKey ListingNumber           ListingCreationDate
## 13079 17A93590655669644DB4C06        951186 2013-10-02 17:20:16.550000000
## 14889 17A93590655669644DB4C06        951186 2013-10-02 17:20:16.550000000
## 20570 17A93590655669644DB4C06        951186 2013-10-02 17:20:16.550000000
## 31451 17A93590655669644DB4C06        951186 2013-10-02 17:20:16.550000000
## 42751 17A93590655669644DB4C06        951186 2013-10-02 17:20:16.550000000
## 42752 17A93590655669644DB4C06        951186 2013-10-02 17:20:16.550000000
##       CreditGrade Term LoanStatus ClosedDate BorrowerAPR BorrowerRate
## 13079               60    Current                0.16662       0.1435
## 14889               60    Current                0.16662       0.1435
## 20570               60    Current                0.16662       0.1435
## 31451               60    Current                0.16662       0.1435
## 42751               60    Current                0.16662       0.1435
## 42752               60    Current                0.16662       0.1435
##       LenderYield EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 13079      0.1335                  0.1264        0.0524           0.074
## 14889      0.1335                  0.1264        0.0524           0.074
## 20570      0.1335                  0.1264        0.0524           0.074
## 31451      0.1335                  0.1264        0.0524           0.074
## 42751      0.1335                  0.1264        0.0524           0.074
## 42752      0.1335                  0.1264        0.0524           0.074
##       ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 13079                       5                     B            4
## 14889                       5                     B            8
## 20570                       5                     B            7
## 31451                       5                     B           10
## 42751                       5                     B            5
## 42752                       5                     B            6
##       ListingCategory..numeric. BorrowerState Occupation EmploymentStatus
## 13079                         1            MD      Other         Employed
## 14889                         1            MD      Other         Employed
## 20570                         1            MD      Other         Employed
## 31451                         1            MD      Other         Employed
## 42751                         1            MD      Other         Employed
## 42752                         1            MD      Other         Employed
##       EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## 13079                       26               False            False
## 14889                       26               False            False
## 20570                       26               False            False
## 31451                       26               False            False
## 42751                       26               False            False
## 42752                       26               False            False
##       GroupKey    DateCreditPulled CreditScoreRangeLower
## 13079          2013-12-23 09:38:12                   720
## 14889          2013-12-23 09:38:12                   720
## 20570          2013-12-23 09:38:12                   720
## 31451          2013-12-23 09:38:12                   720
## 42751          2013-12-23 09:38:12                   720
## 42752          2013-12-23 09:38:12                   720
##       CreditScoreRangeUpper FirstRecordedCreditLine CurrentCreditLines
## 13079                   739     1986-12-26 00:00:00                 12
## 14889                   739     1986-12-26 00:00:00                 12
## 20570                   739     1986-12-26 00:00:00                 12
## 31451                   739     1986-12-26 00:00:00                 12
## 42751                   739     1986-12-26 00:00:00                 12
## 42752                   739     1986-12-26 00:00:00                 12
##       OpenCreditLines TotalCreditLinespast7years OpenRevolvingAccounts
## 13079              12                         20                     6
## 14889              12                         20                     6
## 20570              12                         20                     6
## 31451              12                         20                     6
## 42751              12                         20                     6
## 42752              12                         20                     6
##       OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries
## 13079                         348                    0              5
## 14889                         348                    0              5
## 20570                         348                    0              5
## 31451                         348                    0              5
## 42751                         348                    0              5
## 42752                         348                    0              5
##       CurrentDelinquencies AmountDelinquent DelinquenciesLast7Years
## 13079                    0                0                       0
## 14889                    0                0                       0
## 20570                    0                0                       0
## 31451                    0                0                       0
## 42751                    0                0                       0
## 42752                    0                0                       0
##       PublicRecordsLast10Years PublicRecordsLast12Months
## 13079                        0                         0
## 14889                        0                         0
## 20570                        0                         0
## 31451                        0                         0
## 42751                        0                         0
## 42752                        0                         0
##       RevolvingCreditBalance BankcardUtilization AvailableBankcardCredit
## 13079                  14635                0.57                   10865
## 14889                  14635                0.57                   10865
## 20570                  14635                0.57                   10865
## 31451                  14635                0.57                   10865
## 42751                  14635                0.57                   10865
## 42752                  14635                0.57                   10865
##       TotalTrades TradesNeverDelinquent..percentage.
## 13079          17                                  1
## 14889          17                                  1
## 20570          17                                  1
## 31451          17                                  1
## 42751          17                                  1
## 42752          17                                  1
##       TradesOpenedLast6Months DebtToIncomeRatio    IncomeRange
## 13079                       0              0.41 $25,000-49,999
## 14889                       0              0.41 $25,000-49,999
## 20570                       0              0.41 $25,000-49,999
## 31451                       0              0.41 $25,000-49,999
## 42751                       0              0.41 $25,000-49,999
## 42752                       0              0.41 $25,000-49,999
##       IncomeVerifiable StatedMonthlyIncome                 LoanKey
## 13079             True                3000 CB1B37030986463208432A1
## 14889             True                3000 CB1B37030986463208432A1
## 20570             True                3000 CB1B37030986463208432A1
## 31451             True                3000 CB1B37030986463208432A1
## 42751             True                3000 CB1B37030986463208432A1
## 42752             True                3000 CB1B37030986463208432A1
##       TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
## 13079                NA                         NA                    NA
## 14889                NA                         NA                    NA
## 20570                NA                         NA                    NA
## 31451                NA                         NA                    NA
## 42751                NA                         NA                    NA
## 42752                NA                         NA                    NA
##       ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## 13079                                  NA                              NA
## 14889                                  NA                              NA
## 20570                                  NA                              NA
## 31451                                  NA                              NA
## 42751                                  NA                              NA
## 42752                                  NA                              NA
##       ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## 13079                       NA                          NA
## 14889                       NA                          NA
## 20570                       NA                          NA
## 31451                       NA                          NA
## 42751                       NA                          NA
## 42752                       NA                          NA
##       ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## 13079                          NA                         0
## 14889                          NA                         0
## 20570                          NA                         0
## 31451                          NA                         0
## 42751                          NA                         0
## 42752                          NA                         0
##       LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## 13079                            NA                          2     126059
## 14889                            NA                          2     126059
## 20570                            NA                          2     126059
## 31451                            NA                          2     126059
## 42751                            NA                          2     126059
## 42752                            NA                          2     126059
##       LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## 13079              10000 2014-01-13 00:00:00                Q1 2014
## 14889              10000 2014-01-13 00:00:00                Q1 2014
## 20570              10000 2014-01-13 00:00:00                Q1 2014
## 31451              10000 2014-01-13 00:00:00                Q1 2014
## 42751              10000 2014-01-13 00:00:00                Q1 2014
## 42752              10000 2014-01-13 00:00:00                Q1 2014
##                     MemberKey MonthlyLoanPayment LP_CustomerPayments
## 13079 F80D3694083622957BA09F2              234.5               234.5
## 14889 F80D3694083622957BA09F2              234.5               234.5
## 20570 F80D3694083622957BA09F2              234.5               234.5
## 31451 F80D3694083622957BA09F2              234.5               234.5
## 42751 F80D3694083622957BA09F2              234.5               234.5
## 42752 F80D3694083622957BA09F2              234.5               234.5
##       LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## 13079                       112.62             121.88          -8.49
## 14889                       112.62             121.88          -8.49
## 20570                       112.62             121.88          -8.49
## 31451                       112.62             121.88          -8.49
## 42751                       112.62             121.88          -8.49
## 42752                       112.62             121.88          -8.49
##       LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 13079                 0                     0                   0
## 14889                 0                     0                   0
## 20570                 0                     0                   0
## 31451                 0                     0                   0
## 42751                 0                     0                   0
## 42752                 0                     0                   0
##       LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 13079                               0             1               0
## 14889                               0             1               0
## 20570                               0             1               0
## 31451                               0             1               0
## 42751                               0             1               0
## 42752                               0             1               0
##       InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 13079                          0                           0        96
## 14889                          0                           0        96
## 20570                          0                           0        96
## 31451                          0                           0        96
## 42751                          0                           0        96
## 42752                          0                           0        96

The only column that differs between these rows is ProsperScore. It is not clear what causes ProsperScore to change. Is the same loan saved once for each ProsperScore value it has had?
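The duplicate check above can be sketched as follows: find the keys that occur more than once, pull all their rows, and test which columns actually vary within one key. The data frame `loans` and its values are made up for illustration.

```r
# Toy stand-in for the loan data (hypothetical values).
loans <- data.frame(
  ListingKey   = c("K1", "K2", "K2", "K3", "K2"),
  ProsperScore = c(7, 4, 8, 5, 10),
  BorrowerAPR  = c(0.20, 0.17, 0.17, 0.31, 0.17),
  stringsAsFactors = FALSE
)

# Keys that occur more than once.
dup_keys <- unique(loans$ListingKey[duplicated(loans$ListingKey)])

# All rows that belong to a duplicated key.
dup_rows <- loans[loans$ListingKey %in% dup_keys, ]

# For one duplicated key, which columns take more than one value?
one_key <- dup_rows[dup_rows$ListingKey == "K2", ]
varies  <- sapply(one_key, function(col) length(unique(col)) > 1)

print(names(varies)[varies])
```

If only ProsperScore varies, one option for modelling is to keep a single row per ListingKey, but which row to keep is a judgment call that this sketch does not settle.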

Univariate Plots Section

The main objective of this analysis is to find which factors affect BorrowerAPR, so we first look at the BorrowerAPR distribution.

BorrowerAPR

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00653 0.15629 0.20976 0.21883 0.28381 0.51229      25

The most frequent BorrowerAPR values are around 0.2, with another peak around 0.36.

Next, we explore the variables one by one.

CreditGrade

##          AA     A     B     C     D     E  NA's 
## 84984  3509  3315  4389  5649  5153  3289  3649

Among the graded loans, C is the most common CreditGrade. Many loan records (84984) have no CreditGrade at all, which is reasonable because CreditGrade was only used to assess loans before July 2009; after July 2009, ProsperRating is used for each loan. Either CreditGrade or ProsperRating should be an important factor affecting the APR. There is no “HR” level in CreditGrade. Are the NA values the “HR” level?

Something is wrong here: the NA values should be ‘HR’, so NA is changed to HR.
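A minimal sketch of this recoding, assuming (as the text does) that every NA stands for HR. The vector `grade` is a made-up example; in a factor, the new level has to be added before it can be assigned.

```r
# Toy CreditGrade factor with a blank level and some NA values (hypothetical).
grade <- factor(c("AA", "C", NA, "E", NA, ""),
                levels = c("", "AA", "A", "B", "C", "D", "E"))

# Add the new level first; assigning a value that is not a level
# would only produce NA again, with a warning.
levels(grade) <- c(levels(grade), "HR")
grade[is.na(grade)] <- "HR"

print(table(grade))
```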

##          AA     A     B     C     D     E    HR 
## 84984  3509  3315  4389  5649  5153  3289  3649

Term

##    12    36    60 
##  1614 87778 24545

One guess was that the 12-month term had been discontinued for recent Prosper loans. However, the creation dates of these loans are not old, so that guess is wrong.

The Term variable has just three values; the most frequent is 36 months (three years).

LoanStatus

## 
##              Cancelled             Chargedoff              Completed 
##                      5                  11992                  38074 
##                Current              Defaulted FinalPaymentInProgress 
##                  56576                   5018                    205 
##   Past Due (>120 days)   Past Due (1-15 days)  Past Due (16-30 days) 
##                     16                    806                    265 
##  Past Due (31-60 days)  Past Due (61-90 days) Past Due (91-120 days) 
##                    363                    313                    304

LoanStatus is not a feature we care about for predicting BorrowerAPR, but it could be used to predict which kinds of loans will be charged off. The numbers are striking: defaulted and charged-off loans are not rare (5018 and 11992 of 113937 loans, roughly 15% combined).

BorrowerRate

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

This feature is closely related to BorrowerAPR: roughly, BorrowerAPR = BorrowerRate + origination fee. We will check whether the origination fee changes with the credit grade; judging from Prosper's own introduction, it seems to.
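One way to run that check is to compute the gap between BorrowerAPR and BorrowerRate and average it per grade. This is only a sketch on made-up numbers: the data frame `loans`, the helper column `FeeSpread`, and the values are assumptions, and the gap is a rough proxy for the fee rather than the fee itself.

```r
# Toy stand-in for the loan data (hypothetical values).
loans <- data.frame(
  CreditGrade  = c("A", "A", "D", "D"),
  BorrowerAPR  = c(0.12, 0.14, 0.30, 0.32),
  BorrowerRate = c(0.10, 0.12, 0.26, 0.28),
  stringsAsFactors = FALSE
)

# Gap between APR and the nominal rate, used here as a proxy for the fee.
loans$FeeSpread <- loans$BorrowerAPR - loans$BorrowerRate

# Average gap within each grade.
spread_by_grade <- tapply(loans$FeeSpread, loans$CreditGrade, mean)

print(spread_by_grade)
```

If the averages differ clearly between grades, that supports the idea that the fee depends on the grade.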

ProsperRating..numeric.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   3.000   4.000   4.072   5.000   7.000   29084
## 
##     1     2     3     4     5     6     7 
##  6935  9795 14274 18345 15581 14551  5372

The numeric and alpha values describe the same thing, so we can keep just one.

ProsperRating..Alpha.

##          AA     A     B     C     D     E  NA's 
## 29084  5372 14551 15581 18345 14274  9795  6935

Something is wrong here: the NA values should be ‘HR’, so NA is changed to HR.

##          AA     A     B     C     D     E    HR 
## 29084  5372 14551 15581 18345 14274  9795  6935

As discussed above, we combine the CreditGrade and ProsperRating columns into a single column, CreditRating, that describes the credit quality.

Create CreditRating

##           A    AA     B     C     D     E    HR 
##   131 17866  8881 19970 23994 19427 13084 10584

There are still 131 loans with no CreditRating information.

##          AA     A     B     C     D     E    HR 
##   131  8881 17866 19970 23994 19427 13084 10584

All the credit information is now combined, with the levels ordered from best to worst, ‘AA’ to ‘HR’.
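The combination step could look roughly like the sketch below: take CreditGrade where it is filled in, fall back to ProsperRating otherwise, and store the result as an ordered factor. The data frame `loans`, the plain `ProsperRating` column name, and the precedence rule are assumptions for illustration; the original code is not shown in this report.

```r
# Levels from best to worst.
rating_levels <- c("AA", "A", "B", "C", "D", "E", "HR")

# Toy stand-in: each loan has at most one of the two ratings (hypothetical values).
loans <- data.frame(
  CreditGrade   = c("C", "", "", "HR", ""),
  ProsperRating = c("", "A", "HR", "", ""),
  stringsAsFactors = FALSE
)

# Use CreditGrade where present, otherwise ProsperRating.
combined <- ifelse(loans$CreditGrade != "", loans$CreditGrade, loans$ProsperRating)
combined[combined == ""] <- NA  # neither rating available

# Ordered factor, so comparisons such as "A is better than C" are possible.
loans$CreditRating <- factor(combined, levels = rating_levels, ordered = TRUE)

print(table(loans$CreditRating, useNA = "ifany"))
```

Passing the levels explicitly fixes the ordering, which avoids the alphabetical order (A before AA) seen in the first table above.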

ProsperScore

##     1     2     3     4     5     6     7     8     9    10    11  NA's 
##   992  5766  7642 12595  9813 12278 10597 12053  6911  4750  1456 29084

Why is there an 11? In the data description file, 10 should be the highest value. In any case, the order from best to worst is 11 to 1.

ListingCategory..numeric.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   1.000   2.774   3.000  20.000

The most common listing category is debt consolidation.

BorrowerState

##          AK    AL    AR    AZ    CA    CO    CT    DC    DE    FL    GA 
##  5515   200  1679   855  1901 14717  2210  1627   382   300  6720  5008 
##    HI    IA    ID    IL    IN    KS    KY    LA    MA    MD    ME    MI 
##   409   186   599  5921  2078  1062   983   954  2242  2821   101  3593 
##    MN    MO    MS    MT    NC    ND    NE    NH    NJ    NM    NV    NY 
##  2318  2615   787   330  3084    52   674   551  3097   472  1090  6729 
##    OH    OK    OR    PA    RI    SC    SD    TN    TX    UT    VA    VT 
##  4197   971  1817  2972   435  1122   189  1737  6842   877  3278   207 
##    WA    WI    WV    WY 
##  3048  1842   391   150

The two-letter codes appear to be U.S. state postal abbreviations (for example, CA for California, which has the most loans); 5515 records have no state.

Occupation

##                                                        Accountant/CPA 
##                               3588                               3233 
##           Administrative Assistant                            Analyst 
##                               3688                               3602 
##                          Architect                           Attorney 
##                                213                               1046 
##                          Biologist                         Bus Driver 
##                                125                                316 
##                         Car Dealer                            Chemist 
##                                180                                145 
##                      Civil Service                             Clergy 
##                               1457                                196 
##                           Clerical                Computer Programmer 
##                               3164                               4478 
##                       Construction                            Dentist 
##                               1790                                 68 
##                             Doctor                Engineer - Chemical 
##                                494                                225 
##              Engineer - Electrical              Engineer - Mechanical 
##                               1125                               1406 
##                          Executive                            Fireman 
##                               4311                                422 
##                   Flight Attendant                       Food Service 
##                                123                               1123 
##            Food Service Management                          Homemaker 
##                               1239                                120 
##                           Investor                              Judge 
##                                214                                 22 
##                            Laborer                        Landscaping 
##                               1595                                236 
##                 Medical Technician                  Military Enlisted 
##                               1117                               1272 
##                   Military Officer                        Nurse (LPN) 
##                                346                                492 
##                         Nurse (RN)                       Nurse's Aide 
##                               2489                                491 
##                              Other                         Pharmacist 
##                              28617                                257 
##         Pilot - Private/Commercial  Police Officer/Correction Officer 
##                                199                               1578 
##                     Postal Service                          Principal 
##                                627                                312 
##                       Professional                          Professor 
##                              13628                                557 
##                       Psychologist                            Realtor 
##                                145                                543 
##                          Religious                  Retail Management 
##                                124                               2602 
##                 Sales - Commission                     Sales - Retail 
##                               3446                               2797 
##                          Scientist                      Skilled Labor 
##                                372                               2746 
##                      Social Worker         Student - College Freshman 
##                                741                                 41 
## Student - College Graduate Student           Student - College Junior 
##                                245                                112 
##           Student - College Senior        Student - College Sophomore 
##                                188                                 69 
##        Student - Community College         Student - Technical School 
##                                 28                                 16 
##                            Teacher                     Teacher's Aide 
##                               3759                                276 
##              Tradesman - Carpenter            Tradesman - Electrician 
##                                120                                477 
##               Tradesman - Mechanic                Tradesman - Plumber 
##                                951                                102 
##                       Truck Driver                    Waiter/Waitress 
##                               1675                                436

Occupation is probably not a key feature.

EmploymentStatus

##                    Employed     Full-time Not available  Not employed 
##          2255         67322         26355          5347           835 
##         Other     Part-time       Retired Self-employed 
##          3806          1088           795          6134

We want to collapse the levels into just two: employed and not employed.

Create EmploymentFlag

##     Employed Not employed 
##       113102          835

Most borrowers are employed.
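The counts above (113102 versus 835) suggest that only the "Not employed" level was kept separate and every other level, including Retired, Other and the blank level, was counted as employed. A minimal sketch under that assumption, with a made-up `status` vector:

```r
# Toy EmploymentStatus values (hypothetical).
status <- c("Employed", "Full-time", "Not employed", "Retired", "", "Self-employed")

# Everything except "Not employed" is treated as employed (an assumption
# inferred from the counts above, not confirmed by the original code).
employment_flag <- factor(ifelse(status == "Not employed", "Not employed", "Employed"))

print(table(employment_flag))
```

Whether Retired or "Not available" really belong with Employed is debatable, so a three-level flag might be worth trying as well.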

EmploymentStatusDuration

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   26.00   67.00   96.07  137.00  755.00    7625

IsBorrowerHomeowner

## False  True 
## 56459 57478

False and True are split almost evenly, close to 50% each.

CreditScoreRangeLower and Upper should be combined into one range column, like income range.

Create CreditScoreRange

##    [0-19] [360-379] [420-439] [440-459] [460-479] [480-499] [500-519] 
##       133         1         5        36       141       346       554 
## [520-539] [540-559] [560-579] [580-599] [600-619] [620-639] [640-659] 
##      1593      1474      1357      1125      3602      4172     12199 
## [660-679] [680-699] [700-719] [720-739] [740-759] [760-779] [780-799] 
##     16366     16492     15471     12923      9267      6606      4624 
## [800-819] [820-839] [840-859] [860-879] [880-899]      NA's 
##      2644      1409       567       212        27       591

Most borrowers' credit scores fall in the range 640 to 740. The upper value is always the lower value + 19, so we can keep just one of them for the next revision.

Try to build a new feature with fewer levels than CreditScoreRange, to check whether it improves the relationship with BorrowerAPR.
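The range labels in the table above can be built by pasting the two bounds together. The sketch below uses made-up `lower` and `upper` vectors and also checks the "+ 19" relationship on them.

```r
# Toy CreditScoreRangeLower / CreditScoreRangeUpper values (hypothetical).
lower <- c(720, 640, NA, 0)
upper <- c(739, 659, NA, 19)

# Build labels like "[720-739]"; keep NA where the score is missing.
range_label <- ifelse(is.na(lower), NA, paste0("[", lower, "-", upper, "]"))

# Check that the upper bound is always the lower bound + 19.
gap_is_19 <- all(upper - lower == 19, na.rm = TRUE)

print(range_label)
print(gap_is_19)
```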

Create CreditScoreRevision

Add a variable for the length of the credit history; the longer the history, the lower the APR should be.

Create LengthHistory

## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The unit is days.
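A minimal sketch of such a variable, assuming the history is measured from FirstRecordedCreditLine to ListingCreationDate (the reference date is an assumption, and the dates below are made up). Converting the result to a plain number should also avoid the difftime scale message shown above when plotting.

```r
# Toy dates (hypothetical values).
first_credit_line <- as.Date(c("1986-12-26", "2001-03-15"))
listing_created   <- as.Date(c("2013-10-02", "2008-03-15"))

# Length of credit history in days, as a plain numeric vector.
length_history <- as.numeric(difftime(listing_created, first_credit_line, units = "days"))

print(length_history)
```

Dividing by 365.25 would give approximate years, which may be easier to read on a plot axis.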

OpenRevolvingAccounts

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    4.00    6.00    6.97    9.00   51.00
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
##  3506  4989  7557  9901 11315 11928 11545 10220  8705  7317  5875  4696 
##    12    13    14    15    16    17    18    19    20    21    22    23 
##  3678  2875  2277  1775  1297  1000   760   630   470   360   278   196 
##    24    25    26    27    28    29    30    31    32    33    34    35 
##   185   126   103    80    57    58    42    29    26    12    14    12 
##    36    37    38    39    40    41    44    46    47    49    50    51 
##    10     5     6     5     4     5     1     2     2     1     1     1

InquiriesLast6Months

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.000   1.000   1.435   2.000 105.000     697
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 50005 28621 14432  7697  4297  2610  1664  1014   696   508   372   275 
##    12    13    14    15    16    17    18    19    20    21    22    23 
##   207   163   128    98    79    64    53    33    30    40    22    18 
##    24    25    26    27    28    29    30    31    32    33    34    35 
##    16    14    14     8     8     6     4    10     4     2     4     4 
##    36    37    38    40    41    42    44    46    50    52    53    63 
##     1     3     2     3     1     1     2     1     1     1     1     1 
##    97   105 
##     1     1

TotalInquiries

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   2.000   4.000   5.584   7.000 379.000    1159
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
##  8430 13785 14887 13934 12148 10098  7607  6171  4692  3779  2914  2431 
##    12    13    14    15    16    17    18    19    20    21    22    23 
##  1786  1453  1245   978   864   724   581   539   428   372   347   301 
##    24    25    26    27    28    29    30    31    32    33    34    35 
##   231   205   198   176   146   113   120    90   104    83    58    69 
##    36    37    38    39    40    41    42    43    44    45    46    47 
##    65    50    51    42    38    32    32    27    29    20    32    25 
##    48    49    50    51    52    53    54    55    56    57    58    59 
##    16    15    15    14    14    10     9     9    11     5    12     3 
##    60    61    62    63    64    65    66    67    68    69    70    71 
##     4     9     8     7     6     6     7     4     5     1     6     5 
##    72    74    75    76    77    78    79    80    82    83    85    86 
##     1     4     1     1     2     3     2     1     2     1     3     1 
##    87    88    89    90    93    95    96    97   103   105   106   109 
##     2     1     1     3     2     1     2     2     1     1     1     2 
##   112   113   117   158   377   379 
##     1     1     1     1     1     1

CurrentDelinquencies

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.0000  0.0000  0.5921  0.0000 83.0000     697
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 89742 11716  4357  2098  1379   916   690   517   397   289   212   191 
##    12    13    14    15    16    17    18    19    20    21    22    23 
##   147   111    71    83    58    40    37    28    27    31    21     9 
##    24    25    26    27    28    30    31    32    33    35    36    37 
##    12     5     8    12     5     2     6     5     1     2     2     1 
##    39    40    41    45    50    51    57    59    64    82    83 
##     1     1     2     1     1     1     1     1     1     1     1

AmountDelinquent

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##      0.0      0.0      0.0    984.5      0.0 463881.0     7622
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

DelinquenciesLast7Years

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.000   0.000   4.155   3.000  99.000     990
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 76439  3967  2879  3183  2592  1826  1790  1648  1421  1208  1151  1075 
##    12    13    14    15    16    17    18    19    20    21    22    23 
##   982   873   821   795   731   608   574   540   565   472   421   439 
##    24    25    26    27    28    29    30    31    32    33    34    35 
##   423   347   330   317   296   287   248   214   225   190   190   201 
##    36    37    38    39    40    41    42    43    44    45    46    47 
##   147   153   144   148   113   106   128   101   110    81    90    94 
##    48    49    50    51    52    53    54    55    56    57    58    59 
##    78    74    72    72    55    40    40    39    53    30    31    34 
##    60    61    62    63    64    65    66    67    68    69    70    71 
##    41    34    36    31    28    34    27    22    20    20    15    13 
##    72    73    74    75    76    77    78    79    80    81    82    83 
##    14    17     9    22    10    15    10     8    12     4    12     6 
##    84    85    86    87    88    89    90    91    92    93    94    95 
##     8     3     7     7     9     5     7     4     6     2     3     4 
##    96    97    98    99 
##     4     4     3   110

PublicRecordsLast10Years

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.0000  0.0000  0.3126  0.0000 38.0000     697
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 85803 22834  3011   894   345   151    70    46    31    15     8     7 
##    12    13    14    15    16    17    20    21    22    25    30    34 
##     4     1     4     3     5     1     1     1     1     1     1     1 
##    38 
##     1

PublicRecordsLast12Months

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.000   0.000   0.015   0.000  20.000    7604
## 
##      0      1      2      3      4      7     20 
## 104941   1255     96     28     10      2      1

RevolvingCreditBalance

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       0    3121    8549   17599   19521 1435667    7604

BankcardUtilization

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.310   0.600   0.561   0.840   5.950    7604

AvailableBankcardCredit

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       0     880    4100   11210   13180  646285    7544

TradesNeverDelinquent..percentage.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.820   0.940   0.886   1.000   1.000    7544

DebtToIncomeRatio

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

IncomeRange

##             $0      $1-24,999      $100,000+ $25,000-49,999 $50,000-74,999 
##            621           7274          17337          32192          31050 
## $75,000-99,999  Not displayed   Not employed 
##          16916           7741            806

What does ‘Not displayed’ mean, and what is the difference between ‘Not employed’ and ‘$0’?

IncomeVerifiable

##  False   True 
##   8669 105268

Save the revision data

Univariate Analysis

What is the structure of your dataset?

The dataset contains 113,937 observations of 81 variables. Each observation is one loan record. The data were collected from Prosper WebBank, America’s first peer-to-peer lending marketplace, where borrowers request personal loans and investors fund them. Knowing this background makes the data easier to understand.

What is/are the main feature(s) of interest in your dataset?

Since this is a loan dataset, the feature we care most about is BorrowerAPR. We want to know which factors affect BorrowerAPR and then build a prediction model. The most basic predictor should be CreditRating, but some combination of other variables should also go into the model.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

In my view, the features that may affect BorrowerAPR include at least: ProsperScore, EmploymentStatus, EmploymentFlag, IsBorrowerHomeowner, CreditScoreRange, OpenRevolvingAccounts, InquiriesLast6Months, CurrentDelinquencies, AmountDelinquent, DelinquenciesLast7Years, PublicRecordsLast10Years, PublicRecordsLast12Months, BankcardUtilization, AvailableBankcardCredit, TradesNeverDelinquent, DebtToIncomeRatio, IncomeRange, IncomeVerifiable and LengthHistory.

After a lot of research, I found that a credit score has five important components: payment history, credit utilization, length of credit history, new credit and credit mix. Almost all features in the dataset relate to one of these five components. Apart from CreditRating, ProsperScore and CreditScoreRange, which already take the whole credit history into account, the most relevant features should be the delinquency variables, the public-record variables, BankcardUtilization, AvailableBankcardCredit, DebtToIncomeRatio and LengthHistory.

Did you create any new variables from existing variables in the dataset?

Since CreditGrade is used for the period before July 2009 and ProsperRating for the period after July 2009, I combine them into one CreditRating feature.

I create a variable EmploymentFlag that classifies the employment status as employed or not employed; it is examined further in the Bivariate Plots Section.

I create a variable CreditScoreRange to describe CreditScoreRangeLower and CreditScoreRangeUpper more directly, and then a variable CreditScoreRevision to reduce the number of levels.

I create a variable LengthHistory to measure the length of the credit history.
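The derived variables described above could be built roughly as follows. This is only a sketch on a made-up toy data frame, not the code behind this report: the column name `ProsperRating`, the list of statuses counted as employed, and the definition of LengthHistory as years between the first credit line and the listing date are all assumptions. The breaks for CreditScoreRevision are chosen to match the level labels printed later in this report, and those counts look consistent with cutting CreditScoreRangeLower.

```r
# Toy stand-in for the Prosper loans; all values are made up for illustration.
loans <- data.frame(
  CreditGrade             = c("AA", NA, NA, "C"),
  ProsperRating           = c(NA, "B", NA, NA),  # assumed short name for ProsperRating (Alpha)
  EmploymentStatus        = c("Employed", "Not employed", "Full-time", "Retired"),
  CreditScoreRangeLower   = c(760, 640, 700, 520),
  CreditScoreRangeUpper   = c(779, 659, 719, 539),
  FirstRecordedCreditLine = as.Date(c("1995-03-01", "2001-07-15", "1988-11-30", "2005-01-10")),
  ListingCreationDate     = as.Date(c("2008-05-01", "2012-09-20", "2013-02-11", "2007-08-05")),
  stringsAsFactors = FALSE
)

# CreditRating: use CreditGrade where present, otherwise ProsperRating.
# Rows missing both are set to "HR", following the tidying notes in this report.
rating_levels <- c("AA", "A", "B", "C", "D", "E", "HR")
rating <- ifelse(is.na(loans$CreditGrade), loans$ProsperRating, loans$CreditGrade)
rating[is.na(rating)] <- "HR"
loans$CreditRating <- factor(rating, levels = rating_levels, ordered = TRUE)

# EmploymentFlag: TRUE for statuses treated as employed (assumed grouping).
employed_statuses <- c("Employed", "Full-time", "Part-time", "Self-employed")
loans$EmploymentFlag <- loans$EmploymentStatus %in% employed_statuses

# CreditScoreRange: a readable label such as "[760-779]".
loans$CreditScoreRange <- paste0("[", loans$CreditScoreRangeLower, "-",
                                 loans$CreditScoreRangeUpper, "]")

# CreditScoreRevision: fewer, wider bins; a lower bound of 0 falls outside (0,640] and becomes NA.
loans$CreditScoreRevision <- cut(loans$CreditScoreRangeLower,
                                 breaks = c(0, seq(640, 880, by = 40)))

# LengthHistory: years between first recorded credit line and listing creation (assumed definition).
loans$LengthHistory <- as.numeric(loans$ListingCreationDate -
                                  loans$FirstRecordedCreditLine) / 365.25

print(loans[, c("CreditRating", "EmploymentFlag", "CreditScoreRange",
                "CreditScoreRevision", "LengthHistory")])
```

Making CreditRating an ordered factor with ‘AA’ first means its integer codes grow with risk, which would be one way to get a positive correlation with BorrowerAPR.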

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

CreditGrade and ProsperRating are missing the ‘HR’ level, so I change the NA values to ‘HR’, since ProsperRating (numeric) uses 1 for ‘HR’.

I change Term and ProsperScore to factors, since they take only a limited number of levels.

I change FirstRecordedCreditLine and ListingCreationDate to datetime values.
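A minimal sketch of the type conversions just described, on made-up values. The timestamp layout (`YYYY-MM-DD HH:MM:SS`) and the UTC time zone are assumptions; the real columns may need a different `format` string.

```r
# Toy rows; the values and the timestamp layout are assumptions for illustration.
loans <- data.frame(
  Term                    = c(12, 36, 60, 36),
  ProsperScore            = c(4, 7, NA, 10),
  ListingCreationDate     = c("2007-08-26 19:09:29", "2014-02-27 08:28:07",
                              "2007-01-05 15:00:47", "2012-10-22 11:02:35"),
  FirstRecordedCreditLine = c("2001-10-11 00:00:00", "1996-03-18 00:00:00",
                              "2002-07-27 00:00:00", "1983-02-28 00:00:00"),
  stringsAsFactors = FALSE
)

# Term and ProsperScore take only a few distinct values, so treat them as factors.
loans$Term <- factor(loans$Term)
loans$ProsperScore <- factor(loans$ProsperScore)

# Parse both timestamp columns into POSIXct date-times.
ts_format <- "%Y-%m-%d %H:%M:%S"
loans$ListingCreationDate <- as.POSIXct(loans$ListingCreationDate,
                                        format = ts_format, tz = "UTC")
loans$FirstRecordedCreditLine <- as.POSIXct(loans$FirstRecordedCreditLine,
                                            format = ts_format, tz = "UTC")

str(loans)
```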

Bivariate Plots Section

Select the variables we care most about and take that subset to compute the correlations between them.

Correlation matrix

##               BA           BR          CR            T           PS
## BA   1.000000000  0.989823970  0.87189714 -0.011183469 -0.668287197
## BR   0.989823970  1.000000000  0.87562059  0.020085366 -0.649736144
## CR   0.871897136  0.875620587  1.00000000 -0.070009142 -0.705221420
## T   -0.011183469  0.020085366 -0.07000914  1.000000000  0.028946520
## PS  -0.668287197 -0.649736144 -0.70522142  0.028946520  1.000000000
## LC   0.132455835  0.102913488  0.05017470  0.004947144 -0.009717877
## EF   0.060873776  0.058930969  0.04963288 -0.018943759 -0.023713085
## IB  -0.132822618 -0.134430562 -0.19500074  0.085339314  0.064437777
## CS  -0.451327562 -0.483434065 -0.65953775  0.129818307  0.369603047
## TC   0.002513417 -0.005793111 -0.05396318  0.076527775 -0.037251547
## IL   0.146119308  0.183810023  0.21623650 -0.113568019 -0.296761859
## TI   0.114546407  0.153128748  0.18666351 -0.103132404 -0.215766181
## CD   0.149403936  0.176530084  0.24818167 -0.083807367 -0.100612243
## AD   0.065679195  0.065644674  0.06419472 -0.016458723 -0.041600592
## DL   0.162225391  0.170278704  0.19500056 -0.041492379 -0.097754738
## PR   0.044094991  0.051168539  0.05445116 -0.026251691 -0.014884521
## BU   0.261438040  0.255482029  0.29675023  0.031535353 -0.244695570
## AB  -0.348926135 -0.343861143 -0.38814023  0.015347737  0.318558038
## TT  -0.041893875 -0.048210682 -0.08578411  0.079650347 -0.011130754
## TN  -0.241348883 -0.261189459 -0.31617912  0.119341932  0.129893694
## DI   0.056327417  0.062916780  0.05011206 -0.014670053 -0.145335892
## IR  -0.055260019 -0.033849986  0.02670477 -0.015797041  0.020201444
## IV  -0.109974504 -0.099540159 -0.06735857  0.040402465  0.154618878
## LH  -0.028707233 -0.052822826 -0.11367644  0.103460569  0.021765547
## CSR -0.478930786 -0.494403160 -0.61788683  0.101562235  0.358797768
##               LC            EF            IB          CS           TC
## BA   0.132455835  6.087378e-02 -0.1328226179 -0.45132756  0.002513417
## BR   0.102913488  5.893097e-02 -0.1344305618 -0.48343406 -0.005793111
## CR   0.050174703  4.963288e-02 -0.1950007397 -0.65953775 -0.053963183
## T    0.004947144 -1.894376e-02  0.0853393140  0.12981831  0.076527775
## PS  -0.009717877 -2.371309e-02  0.0644377769  0.36960305 -0.037251547
## LC   1.000000000  3.193896e-02 -0.0382242873  0.10301591 -0.038400838
## EF   0.031938962  1.000000e+00 -0.0430626838  0.01024205 -0.041491813
## IB  -0.038224287 -4.306268e-02  1.0000000000  0.30235721  0.293586654
## CS   0.103015914  1.024205e-02  0.3023572068  1.00000000  0.105728412
## TC  -0.038400838 -4.149181e-02  0.2935866539  0.10572841  1.000000000
## IL  -0.072643941 -2.007191e-02  0.0068929738 -0.26938598  0.072628893
## TI  -0.091324751 -3.238113e-02  0.0665960662 -0.29323289  0.168401756
## CD  -0.049935645 -7.477945e-03 -0.0554537080 -0.37388191  0.067600665
## AD   0.022202378 -2.616979e-03  0.0381222589 -0.06584937  0.050983528
## DL   0.016949523 -1.314943e-02 -0.0707983959 -0.25983065  0.146574191
## PR   0.003167448  2.847180e-04 -0.0150161212 -0.08344302  0.005523972
## BU  -0.087582952 -2.507352e-02  0.0866408510 -0.40533805  0.100924234
## AB  -0.031516945 -8.422820e-03  0.1420390221  0.45369414  0.194360344
## TT  -0.065431345 -5.303084e-02  0.3174057629  0.14002810  0.936482440
## TN  -0.022489056  8.756344e-05  0.1371216760  0.46866893  0.044293815
## DI  -0.042754149  1.521460e-01  0.0001774271 -0.01499532  0.037486139
## IR  -0.098098851  2.123003e-01  0.0266316683 -0.12687275  0.038153883
## IV  -0.043461048 -2.551694e-01  0.0641075614 -0.06329716  0.052501724
## LH   0.028583154 -1.779636e-02  0.2007114714  0.22600401  0.367817571
## CSR  0.063586870  1.007847e-02  0.2949463342  0.91878992  0.074654387
##               IL          TI            CD           AD          DL
## BA   0.146119308  0.11454641  0.1494039356  0.065679195  0.16222539
## BR   0.183810023  0.15312875  0.1765300836  0.065644674  0.17027870
## CR   0.216236497  0.18666351  0.2481816718  0.064194718  0.19500056
## T   -0.113568019 -0.10313240 -0.0838073668 -0.016458723 -0.04149238
## PS  -0.296761859 -0.21576618 -0.1006122425 -0.041600592 -0.09775474
## LC  -0.072643941 -0.09132475 -0.0499356450  0.022202378  0.01694952
## EF  -0.020071906 -0.03238113 -0.0074779453 -0.002616979 -0.01314943
## IB   0.006892974  0.06659607 -0.0554537080  0.038122259 -0.07079840
## CS  -0.269385980 -0.29323289 -0.3738819082 -0.065849371 -0.25983065
## TC   0.072628893  0.16840176  0.0676006646  0.050983528  0.14657419
## IL   1.000000000  0.74194993  0.1563415408  0.023968896  0.09032870
## TI   0.741949925  1.00000000  0.1751286458  0.031494616  0.11135223
## CD   0.156341541  0.17512865  1.0000000000  0.340548522  0.37777692
## AD   0.023968896  0.03149462  0.3405485218  1.000000000  0.23327026
## DL   0.090328703  0.11135223  0.3777769220  0.233270261  1.00000000
## PR   0.048872572  0.05606020  0.1116605006  0.041158349  0.08385023
## BU  -0.032599094  0.01788041 -0.0437730700 -0.024321355 -0.02948350
## AB  -0.004564040 -0.01692251 -0.0924328516 -0.020285733 -0.13448634
## TT   0.075307472  0.17166533 -0.0002363269  0.031958692  0.09387175
## TN  -0.119889136 -0.12797625 -0.4587605913 -0.138624920 -0.51644318
## DI   0.024435906  0.02856406 -0.0242645963 -0.019397486 -0.04387671
## IR   0.100565579  0.10171298  0.1230138455  0.005495840  0.05991164
## IV   0.045767194  0.05053806  0.0429200919  0.008977402  0.04120704
## LH  -0.101054528 -0.10188945 -0.0218254427  0.042973961  0.08520953
## CSR -0.189599636 -0.22532376 -0.2345581052 -0.055521784 -0.23644920
##               PR          BU           AB            TT            TN
## BA   0.044094991  0.26143804 -0.348926135 -0.0418938748 -2.413489e-01
## BR   0.051168539  0.25548203 -0.343861143 -0.0482106823 -2.611895e-01
## CR   0.054451158  0.29675023 -0.388140234 -0.0857841144 -3.161791e-01
## T   -0.026251691  0.03153535  0.015347737  0.0796503467  1.193419e-01
## PS  -0.014884521 -0.24469557  0.318558038 -0.0111307542  1.298937e-01
## LC   0.003167448 -0.08758295 -0.031516945 -0.0654313455 -2.248906e-02
## EF   0.000284718 -0.02507352 -0.008422820 -0.0530308417  8.756344e-05
## IB  -0.015016121  0.08664085  0.142039022  0.3174057629  1.371217e-01
## CS  -0.083443019 -0.40533805  0.453694143  0.1400280960  4.686689e-01
## TC   0.005523972  0.10092423  0.194360344  0.9364824396  4.429381e-02
## IL   0.048872572 -0.03259909 -0.004564040  0.0753074717 -1.198891e-01
## TI   0.056060197  0.01788041 -0.016922506  0.1716653315 -1.279762e-01
## CD   0.111660501 -0.04377307 -0.092432852 -0.0002363269 -4.587606e-01
## AD   0.041158349 -0.02432135 -0.020285733  0.0319586921 -1.386249e-01
## DL   0.083850226 -0.02948350 -0.134486344  0.0938717472 -5.164432e-01
## PR   1.000000000 -0.02120356 -0.027696759 -0.0086041890 -1.146543e-01
## BU  -0.021203560  1.00000000 -0.350830600  0.1005900351  3.926058e-02
## AB  -0.027696759 -0.35083060  1.000000000  0.2499170866  2.384296e-01
## TT  -0.008604189  0.10059004  0.249917087  1.0000000000  1.220540e-01
## TN  -0.114654296  0.03926058  0.238429612  0.1220539905  1.000000e+00
## DI  -0.008150796  0.03559958  0.002058548  0.0393176800  5.463910e-02
## IR  -0.008929898  0.03497872 -0.005458496  0.0944064561  2.189603e-02
## IV  -0.005135338  0.04672484 -0.045080901  0.0584803707 -4.632649e-02
## LH   0.007866744  0.07993760  0.154917210  0.3981152488  6.083840e-03
## CSR -0.061944128 -0.41934418  0.471888435  0.1189872825  3.969301e-01
##                DI           IR           IV            LH         CSR
## BA   5.632742e-02 -0.055260019 -0.109974504 -2.870723e-02 -0.47893079
## BR   6.291678e-02 -0.033849986 -0.099540159 -5.282283e-02 -0.49440316
## CR   5.011206e-02  0.026704772 -0.067358572 -1.136764e-01 -0.61788683
## T   -1.467005e-02 -0.015797041  0.040402465  1.034606e-01  0.10156223
## PS  -1.453359e-01  0.020201444  0.154618878  2.176555e-02  0.35879777
## LC  -4.275415e-02 -0.098098851 -0.043461048  2.858315e-02  0.06358687
## EF   1.521460e-01  0.212300278 -0.255169351 -1.779636e-02  0.01007847
## IB   1.774271e-04  0.026631668  0.064107561  2.007115e-01  0.29494633
## CS  -1.499532e-02 -0.126872751 -0.063297158  2.260040e-01  0.91878992
## TC   3.748614e-02  0.038153883  0.052501724  3.678176e-01  0.07465439
## IL   2.443591e-02  0.100565579  0.045767194 -1.010545e-01 -0.18959964
## TI   2.856406e-02  0.101712983  0.050538058 -1.018894e-01 -0.22532376
## CD  -2.426460e-02  0.123013845  0.042920092 -2.182544e-02 -0.23455811
## AD  -1.939749e-02  0.005495840  0.008977402  4.297396e-02 -0.05552178
## DL  -4.387671e-02  0.059911645  0.041207043  8.520953e-02 -0.23644920
## PR  -8.150796e-03 -0.008929898 -0.005135338  7.866744e-03 -0.06194413
## BU   3.559958e-02  0.034978717  0.046724845  7.993760e-02 -0.41934418
## AB   2.058548e-03 -0.005458496 -0.045080901  1.549172e-01  0.47188844
## TT   3.931768e-02  0.094406456  0.058480371  3.981152e-01  0.11898728
## TN   5.463910e-02  0.021896030 -0.046326486  6.083840e-03  0.39693013
## DI   1.000000e+00 -0.077113701 -0.600516568 -6.130953e-05 -0.02037398
## IR  -7.711370e-02  1.000000000  0.068199507 -2.421596e-02 -0.06084069
## IV  -6.005166e-01  0.068199507  1.000000000 -1.148076e-02 -0.05411182
## LH  -6.130953e-05 -0.024215961 -0.011480757  1.000000e+00  0.18279835
## CSR -2.037398e-02 -0.060840692 -0.054111816  1.827983e-01  1.00000000
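A matrix like the one above can be computed with `cor()`. The sketch below uses simulated data with three columns only; the short names BA, CS and BU are read here as BorrowerAPR, CreditScoreRange and BankcardUtilization (an inferred mapping), and the choice of `use = "pairwise.complete.obs"` to cope with missing values is an assumption about how the NAs were handled.

```r
# Simulated stand-in data; the relationships are invented for illustration.
set.seed(1)
n <- 200
score <- round(runif(n, 600, 850))
apr   <- 0.6 - 0.0005 * score + rnorm(n, sd = 0.02)
util  <- runif(n)
util[sample(n, 20)] <- NA            # some missing values, as in the real data

# Factors would first be converted to integer codes, e.g. as.integer(CreditRating).
sub <- data.frame(BA = apr, CS = score, BU = util)

# Each pairwise correlation uses the rows where both variables are present.
cor_mat <- cor(sub, use = "pairwise.complete.obs")
print(round(cor_mat, 3))
```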

Select the relationship: Very Strong (0.8 - 1), Strong (0.6 - 0.8), Moderate (0.4 - 0.6), Weak (0.2 - 0.4)

##     Var1 Var2     value
## 1     BA   BA 1.0000000
## 2     BR   BA 0.9898240
## 3     CR   BA 0.8718971
## 26    BA   BR 0.9898240
## 27    BR   BR 1.0000000
## 28    CR   BR 0.8756206
## 51    BA   CR 0.8718971
## 52    BR   CR 0.8756206
## 53    CR   CR 1.0000000
## 79     T    T 1.0000000
## 105   PS   PS 1.0000000
## 131   LC   LC 1.0000000
## 157   EF   EF 1.0000000
## 183   IB   IB 1.0000000
## 209   CS   CS 1.0000000
## 225  CSR   CS 0.9187899
## 235   TC   TC 1.0000000
## 244   TT   TC 0.9364824
## 261   IL   IL 1.0000000
## 287   TI   TI 1.0000000
## 313   CD   CD 1.0000000
## 339   AD   AD 1.0000000
## 365   DL   DL 1.0000000
## 391   PR   PR 1.0000000
## 417   BU   BU 1.0000000
## 443   AB   AB 1.0000000
## 460   TC   TT 0.9364824
## 469   TT   TT 1.0000000
## 495   TN   TN 1.0000000
## 521   DI   DI 1.0000000
## 547   IR   IR 1.0000000
## 573   IV   IV 1.0000000
## 599   LH   LH 1.0000000
## 609   CS  CSR 0.9187899
## 625  CSR  CSR 1.0000000
##     Var1 Var2      value
## 5     PS   BA -0.6682872
## 30    PS   BR -0.6497361
## 55    PS   CR -0.7052214
## 59    CS   CR -0.6595377
## 75   CSR   CR -0.6178868
## 101   BA   PS -0.6682872
## 102   BR   PS -0.6497361
## 103   CR   PS -0.7052214
## 203   CR   CS -0.6595377
## 262   TI   IL  0.7419499
## 286   IL   TI  0.7419499
## 523   IV   DI -0.6005166
## 571   DI   IV -0.6005166
## 603   CR  CSR -0.6178868
##     Var1 Var2      value
## 9     CS   BA -0.4513276
## 25   CSR   BA -0.4789308
## 34    CS   BR -0.4834341
## 50   CSR   BR -0.4944032
## 201   BA   CS -0.4513276
## 202   BR   CS -0.4834341
## 217   BU   CS -0.4053380
## 218   AB   CS  0.4536941
## 220   TN   CS  0.4686689
## 320   TN   CD -0.4587606
## 370   TN   DL -0.5164432
## 409   CS   BU -0.4053380
## 425  CSR   BU -0.4193442
## 434   CS   AB  0.4536941
## 450  CSR   AB  0.4718884
## 484   CS   TN  0.4686689
## 488   CD   TN -0.4587606
## 490   DL   TN -0.5164432
## 601   BA  CSR -0.4789308
## 602   BR  CSR -0.4944032
## 617   BU  CSR -0.4193442
## 618   AB  CSR  0.4718884
##     Var1 Var2      value
## 17    BU   BA  0.2614380
## 18    AB   BA -0.3489261
## 20    TN   BA -0.2413489
## 42    BU   BR  0.2554820
## 43    AB   BR -0.3438611
## 45    TN   BR -0.2611895
## 61    IL   CR  0.2162365
## 63    CD   CR  0.2481817
## 67    BU   CR  0.2967502
## 68    AB   CR -0.3881402
## 70    TN   CR -0.3161791
## 109   CS   PS  0.3696030
## 111   IL   PS -0.2967619
## 112   TI   PS -0.2157662
## 117   BU   PS -0.2446956
## 118   AB   PS  0.3185580
## 125  CSR   PS  0.3587978
## 172   IR   EF  0.2123003
## 173   IV   EF -0.2551694
## 184   CS   IB  0.3023572
## 185   TC   IB  0.2935867
## 194   TT   IB  0.3174058
## 199   LH   IB  0.2007115
## 200  CSR   IB  0.2949463
## 205   PS   CS  0.3696030
## 208   IB   CS  0.3023572
## 211   IL   CS -0.2693860
## 212   TI   CS -0.2932329
## 213   CD   CS -0.3738819
## 215   DL   CS -0.2598307
## 224   LH   CS  0.2260040
## 233   IB   TC  0.2935867
## 249   LH   TC  0.3678176
## 253   CR   IL  0.2162365
## 255   PS   IL -0.2967619
## 259   CS   IL -0.2693860
## 280   PS   TI -0.2157662
## 284   CS   TI -0.2932329
## 300  CSR   TI -0.2253238
## 303   CR   CD  0.2481817
## 309   CS   CD -0.3738819
## 314   AD   CD  0.3405485
## 315   DL   CD  0.3777769
## 325  CSR   CD -0.2345581
## 338   CD   AD  0.3405485
## 340   DL   AD  0.2332703
## 359   CS   DL -0.2598307
## 363   CD   DL  0.3777769
## 364   AD   DL  0.2332703
## 375  CSR   DL -0.2364492
## 401   BA   BU  0.2614380
## 402   BR   BU  0.2554820
## 403   CR   BU  0.2967502
## 405   PS   BU -0.2446956
## 418   AB   BU -0.3508306
## 426   BA   AB -0.3489261
## 427   BR   AB -0.3438611
## 428   CR   AB -0.3881402
## 430   PS   AB  0.3185580
## 442   BU   AB -0.3508306
## 444   TT   AB  0.2499171
## 445   TN   AB  0.2384296
## 458   IB   TT  0.3174058
## 468   AB   TT  0.2499171
## 474   LH   TT  0.3981152
## 476   BA   TN -0.2413489
## 477   BR   TN -0.2611895
## 478   CR   TN -0.3161791
## 493   AB   TN  0.2384296
## 500  CSR   TN  0.3969301
## 532   EF   IR  0.2123003
## 557   EF   IV -0.2551694
## 583   IB   LH  0.2007115
## 584   CS   LH  0.2260040
## 585   TC   LH  0.3678176
## 594   TT   LH  0.3981152
## 605   PS  CSR  0.3587978
## 608   IB  CSR  0.2949463
## 612   TI  CSR -0.2253238
## 613   CD  CSR -0.2345581
## 615   DL  CSR -0.2364492
## 620   TN  CSR  0.3969301
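The grouping by strength shown above can be sketched in base R as follows. The small matrix reuses a few rounded coefficients from this report; reshaping with `as.data.frame(as.table(...))` and dropping the self-pairs are my own choices here, not necessarily what produced the listings above.

```r
# A small correlation matrix with rounded values taken from the report.
v <- c("BA", "BR", "PS", "BU")
cor_mat <- matrix(c( 1.00,  0.99, -0.67,  0.26,
                     0.99,  1.00, -0.65,  0.26,
                    -0.67, -0.65,  1.00, -0.24,
                     0.26,  0.26, -0.24,  1.00),
                  nrow = 4, dimnames = list(v, v))

# Long format: one row per (Var1, Var2) pair.
pairs <- as.data.frame(as.table(cor_mat), responseName = "value")

# Drop the trivial self-correlations on the diagonal.
pairs <- pairs[as.character(pairs$Var1) != as.character(pairs$Var2), ]

# Band the absolute correlation; anything below 0.2 gets NA (no band).
pairs$strength <- cut(abs(pairs$value),
                      breaks = c(0.2, 0.4, 0.6, 0.8, 1),
                      labels = c("Weak", "Moderate", "Strong", "Very strong"),
                      right = FALSE, include.lowest = TRUE)

print(split(pairs[, c("Var1", "Var2", "value")], pairs$strength))
```

Each pair appears twice because the matrix is symmetric; keeping only `upper.tri(cor_mat)` would halve the listing.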

From the correlation matrix, we can see a very strong relationship between BorrowerAPR and CreditRating. This meets our expectation: the better the CreditRating, the lower the BorrowerAPR should be. (The coefficient is positive, 0.87, which suggests CreditRating is coded so that riskier grades take larger values.)

There is also a strong relationship between BorrowerAPR and ProsperScore, and CreditRating has a strong relationship with ProsperScore as well.

BorrowerAPR has only a moderate relationship with CreditScoreRange (-0.45), while CreditRating has a strong one (-0.66). Interesting: why is the link to BorrowerAPR only moderate?

BorrowerAPR has a weak relationship with BankcardUtilization, AvailableBankcardCredit and TradesNeverDelinquent. These three are at most weakly related to each other, but each has a moderate relationship with CreditScoreRange.

InquiriesLast6Months has a strong relationship with TotalInquiries, which is reasonable.

Interesting: IncomeVerifiable has a strong relationship with DebtToIncomeRatio. Why?

TradesNeverDelinquent has a moderate relationship with CurrentDelinquencies and DelinquenciesLast7Years, which makes sense.

There are many weak relationships.

The value of the created variables is not that obvious. CreditScoreRevision improves a little on CreditScoreRange, so I will use CreditScoreRevision in the following analysis.

I have not found any variables that are related to BorrowerAPR but not to CreditRating.

One more idea: I want to know whether the origination fee changes with CreditRating, so I build a variable OrginationFee and visualize it.

We can see that the origination fee also changes with the CreditRating level. The most obvious difference is between levels ‘AA’ and ‘A’.

BorrowerAPR of CreditRating

The result meets our expectation, as mentioned before. However, why are there so many outliers, and why is the variance within each level not small? It seems that other variables also drive BorrowerAPR, and their influence cannot be ignored.

Keeping CreditRating fixed, check the influence of ProsperScore.

BorrowerAPR of ProsperScore

As the correlation value calculated before suggests, there is a strong relationship between BorrowerAPR and ProsperScore.

Next, plot the relationship between ProsperScore and CreditRating. I am still confused: how is CreditRating determined, and how is ProsperScore calculated?

CreditRating vs ProsperScore

From the figure, we can see that some ‘AA’ borrowers still have a high-risk score, e.g., 4. Therefore, we should keep the ProsperScore feature: it can describe a different dimension for the BorrowerAPR prediction.

BorrowerAPR of CreditScoreRange

##    [0-19] [360-379] [420-439] [440-459] [460-479] [480-499] [500-519] 
##       133         1         5        36       141       346       554 
## [520-539] [540-559] [560-579] [580-599] [600-619] [620-639] [640-659] 
##      1593      1474      1357      1125      3602      4172     12199 
## [660-679] [680-699] [700-719] [720-739] [740-759] [760-779] [780-799] 
##     16366     16492     15471     12923      9267      6606      4624 
## [800-819] [820-839] [840-859] [860-879] [880-899]      NA's 
##      2644      1409       567       212        27       591

The sample size in both tails is too small, e.g., in [0 - 440]. The score comes from a consumer credit rating agency, which raises the same question again: how is the credit rating obtained?

CreditRating vs CreditScoreRange

From the figure, we can see that ‘AA’ borrowers may have a low credit score. Why? In any case, CreditScoreRange is still an important feature for the prediction.

Try reducing the number of CreditScoreRange levels to check whether this improves the correlation value.

BorrowerAPR of CreditScoreRevision

##   (0,640] (640,680] (680,720] (720,760] (760,800] (800,840] (840,880] 
##     26605     32858     28394     15873      7268      1976       239 
##      NA's 
##       724

This is clearer than CreditScoreRange.

BorrowerAPR of BankcardUtilization

If I understand correctly, BankcardUtilization is the ratio of credit card balances to credit limits. The higher the BankcardUtilization, the higher the BorrowerAPR, because high utilization leads lenders to perceive increased risk.

BorrowerAPR of AvailableBankcardCredit

Based on the earlier correlation output, AvailableBankcardCredit should have a weak relationship with BorrowerAPR, and this figure shows it. As far as I know, credit limit = credit balance + pending transactions + available credit, so this feature reflects the credit limit indirectly.
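To make "indirectly" concrete, here is a small worked example. It assumes pending transactions are zero and that utilization is exactly balance divided by limit, so that available credit = limit × (1 − utilization). Both assumptions are simplifications, and the formula is undefined when utilization is 1 or more, which does occur in this dataset (the maximum is 5.95).

```r
# Made-up utilization / available-credit pairs (not taken from the dataset).
utilization <- c(0.60, 0.25, 0.90)
available   <- c(4100, 9000, 500)

# With pending = 0: available = limit - balance and utilization = balance / limit,
# hence limit = available / (1 - utilization). Only valid for utilization < 1.
implied_limit   <- available / (1 - utilization)
implied_balance <- implied_limit - available

print(data.frame(utilization, available, implied_limit, implied_balance))
```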

BorrowerAPR of TradesNeverDelinquent..percentage.

The higher TradesNeverDelinquent..percentage., the lower the BorrowerAPR.

BorrowerAPR of EmploymentStatus

BorrowerAPR of EmploymentFlag

This should be clear now: employed borrowers may get a lower BorrowerAPR. But why is the correlation between BorrowerAPR and EmploymentFlag so low? Looking more carefully, the tail below the 25% quantile seems long.

BorrowerAPR of CurrentDelinquencies

BorrowerAPR of AmountDelinquent

BorrowerAPR of IsBorrowerHomeowner

BorrowerAPR of InquiriesLast6Months

BorrowerAPR of DelinquenciesLast7Years

BorrowerAPR of TotalTrades

BorrowerAPR of DebtToIncomeRatio

BorrowerAPR of OpenRevolvingAccounts

BorrowerAPR of IncomeRange

The ‘$0’ level does not look reasonable, and there are not enough samples. What is the difference between ‘$0’ and ‘Not employed’?

BorrowerAPR of IncomeVerifiable

BorrowerAPR of PublicRecordsLast10Years

BorrowerAPR of ProsperPaymentsOneMonthPlusLate

BorrowerAPR of TotalProsperLoans

IncomeVerifiable vs DebtToIncomeRatio

When IncomeVerifiable is false, the median DebtToIncomeRatio is 10.01, which is also the maximum value of DebtToIncomeRatio. What does 10.01 mean? It seems that the strong relationship between DebtToIncomeRatio and IncomeVerifiable is meaningless and not useful for our objective.
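A quick way to check this kind of pattern is to compare the median and the share of rows sitting at the maximum for each IncomeVerifiable group. The sketch below uses made-up rows. As far as the Prosper variable definitions go, DebtToIncomeRatio appears to be capped at 10.01, which would make the maximum a ceiling rather than a real ratio; treat that reading as an assumption to verify against the data dictionary.

```r
# Made-up rows that mimic the pattern described above.
loans <- data.frame(
  IncomeVerifiable  = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  DebtToIncomeRatio = c(0.15, 0.22, 0.31, 10.01, 10.01, 0.40, NA)
)

# Median ratio per group, ignoring missing values.
med <- tapply(loans$DebtToIncomeRatio, loans$IncomeVerifiable,
              median, na.rm = TRUE)

# Share of non-missing rows in each group that sit exactly at the observed maximum.
cap <- max(loans$DebtToIncomeRatio, na.rm = TRUE)
share_at_cap <- tapply(loans$DebtToIncomeRatio == cap, loans$IncomeVerifiable,
                       mean, na.rm = TRUE)

print(med)
print(share_at_cap)
```

If most unverifiable-income rows sit at the cap, the correlation mostly reflects how the ratio is recorded for those borrowers rather than a real difference in debt burden.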

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

There is a very strong relationship between BorrowerAPR and CreditRating. This meets our expectation: the better the CreditRating, the lower the BorrowerAPR should be.

There is also a strong relationship between BorrowerAPR and ProsperScore, and CreditRating has a strong relationship with ProsperScore as well. From the bar plot, some ‘AA’ borrowers still have a high-risk score, e.g., 4. Therefore, we should keep the ProsperScore feature: it can describe a different dimension for the BorrowerAPR prediction.

BorrowerAPR has a moderate relationship with CreditScoreRange (-0.45), while CreditRating has a strong one (-0.66). We can also see that ‘AA’ borrowers may have a low credit score. Why? In any case, CreditScoreRange is still a feature for the prediction.

BorrowerAPR has a weak relationship with BankcardUtilization, AvailableBankcardCredit and TradesNeverDelinquent. These three are at most weakly related to each other, but each has a moderate relationship with CreditScoreRange.

What confuses me is where CreditRating and ProsperScore come from. Both should combine the important credit information, such as payment history, credit utilization, length of credit history, new credit and credit mix, all of which are historical features.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

InquiriesLast6Months has a strong relationship with TotalInquiries, which is reasonable. TradesNeverDelinquent has a moderate relationship with CurrentDelinquencies and DelinquenciesLast7Years, which makes sense. All of this suggests that history can predict the present status.

IncomeVerifiable has a strong relationship with DebtToIncomeRatio. However, when IncomeVerifiable is false, the median DebtToIncomeRatio equals the maximum value, 10.01. I do not know what this value means, so this relationship is probably not useful for our objective.

What was the strongest relationship you found?

BorrowerAPR and CreditRating

Multivariate Plots Section

BorrowerAPR vs CreditRating vs ProsperScore

BorrowerAPR vs CreditRating vs CreditScoreRevision

BorrowerAPR vs CreditRating vs BankcardUtilization

We know there is a weak relationship between CreditRating and BankcardUtilization, and the figure confirms it. Most obviously, ‘AA’ borrowers tend to have low BankcardUtilization, while ‘HR’ and ‘E’ borrowers tend to have high BankcardUtilization.

BorrowerAPR vs CreditRating vs TradesNeverDelinquent..percentage.

From the figure, we can see the weak relationship between TradesNeverDelinquent..percentage. and CreditRating: more ‘HR’ borrowers than ‘AA’ borrowers have a low TradesNeverDelinquent..percentage. Moreover, with CreditRating held fixed, a lower TradesNeverDelinquent..percentage. tends to go with a higher BorrowerAPR.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

BorrowerAPR has a very strong relationship with CreditRating, a strong relationship with ProsperScore, a moderate relationship with CreditScoreRevision, and a weak relationship with BankcardUtilization, AvailableBankcardCredit and TradesNeverDelinquent. All of these features also have a strong, moderate or weak relationship with CreditRating. However, if we dig deeper, each of them describes a different dimension of the data. I still wonder how the CreditRating value is obtained. Is it calculated from historical data such as payment history, credit utilization, length of credit history, new credit and credit mix? How are ProsperScore and CreditScoreRange obtained, and how do they differ?

Were there any interesting or surprising interactions between features?

Many features have a strong or moderate relationship with CreditRating. If CreditRating, ProsperScore and CreditScoreRevision all take historical data into account, why are they different, and why do we need them all?

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

I want to build a model to predict BorrowerAPR. This is left for future work and can use machine learning algorithms.
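As a starting point for that future work, the sketch below fits a plain linear model on simulated data with an 80/20 train/test split. Nothing here was run on the Prosper data: the simulated coefficients, the two predictors and the split ratio are all assumptions chosen for illustration.

```r
# Simulated data: APR rises with riskier rating and higher utilization (invented effect sizes).
set.seed(42)
n <- 500
rating_levels <- c("AA", "A", "B", "C", "D", "E", "HR")
loans <- data.frame(
  CreditRating        = factor(sample(rating_levels, n, replace = TRUE),
                               levels = rating_levels),
  BankcardUtilization = runif(n)
)
loans$BorrowerAPR <- 0.08 +
  0.04 * (as.integer(loans$CreditRating) - 1) +
  0.03 * loans$BankcardUtilization +
  rnorm(n, sd = 0.02)

# Random 80/20 split into training and testing rows.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- loans[train_idx, ]
test  <- loans[-train_idx, ]

# Fit on the training rows only, then score the held-out rows.
fit  <- lm(BorrowerAPR ~ CreditRating + BankcardUtilization, data = train)
pred <- predict(fit, newdata = test)

# Root mean squared error on the test set.
rmse <- sqrt(mean((test$BorrowerAPR - pred)^2))
print(rmse)
```

Treating CreditRating as an unordered factor gives each grade its own offset, which avoids assuming that the APR step between adjacent grades is constant.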

Final Plots and Summary

Plot One

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00653 0.15629 0.20976 0.21883 0.28381 0.51229      25

Description One

From this figure, we can see several peaks. The highest one is around 0.2, and another large peak is around 0.36.

Plot Two

Description Two

The better the CreditRating, the lower the BorrowerAPR. However, the variance within each CreditRating level is not small, so other important features must also influence the final BorrowerAPR. From the trend, it seems that a linear model could be built to predict BorrowerAPR.

Plot Three

Description Three

From the figure, we can see the weak relationship between TradesNeverDelinquent..percentage. and CreditRating: more ‘HR’ borrowers than ‘AA’ borrowers have a low TradesNeverDelinquent..percentage. Moreover, with CreditRating held fixed, a lower TradesNeverDelinquent..percentage. tends to go with a higher BorrowerAPR.

Reflection

This is a loan dataset, and it took me a great deal of time to understand each variable. That included reading the variable descriptions in the Excel data dictionary and researching credit loans and Prosper WebBank.

Then I explored the data with univariate and bivariate plots to learn what the data could tell me. BorrowerAPR is what borrowers and lenders care about most. Borrowers want to know how to reduce their BorrowerAPR; lenders want to know which loans will earn them the most money and minimize their losses. I found a lot of information in the bivariate plots. Once the correlation matrix was calculated, I compared its values with the plots and finally understood what was going on.

To predict BorrowerAPR better, I created some new variables, e.g., CreditRating, CreditScoreRange, CreditScoreRevision, EmploymentFlag and LengthHistory. However, their influence is not that obvious: only the relationship between BorrowerAPR and CreditScoreRevision improves a little compared with the relationship between BorrowerAPR and CreditScoreRange.

BorrowerAPR has a very strong relationship with CreditRating, a strong relationship with ProsperScore, a moderate relationship with CreditScoreRevision, and a weak relationship with BankcardUtilization, AvailableBankcardCredit and TradesNeverDelinquent. CreditRating, in turn, has a strong relationship with both ProsperScore (-0.71) and CreditScoreRevision (-0.62). However, ProsperScore and CreditScoreRevision each describe a different view from CreditRating, so both should be features in the prediction model. Moreover, CreditScoreRevision has a moderate relationship with BankcardUtilization and AvailableBankcardCredit, and a nearly moderate one (0.40) with TradesNeverDelinquent. I have not found features that are unrelated to CreditRating but related to BorrowerAPR, which makes sense, since all the score data (i.e., CreditRating, ProsperScore, CreditScoreRevision) already take the historical data into account.

I am still confused about how the CreditRating value is obtained. Is it calculated from historical data such as payment history, credit utilization, length of credit history, new credit and credit mix? How is ProsperScore obtained? Why are there so many scores in this dataset?

Many questions remain unanswered, and more clues are needed to answer them. In other words, part of the dataset is still unclear to me. This information cannot be obtained by exploring the data; the data collectors would have to be contacted for more details.

In the future, I will build a model to predict BorrowerAPR: split the dataset into training and testing data, then fit a model using machine learning algorithms.